A New Torus-like Mapping for Parallel Sparse Matrix Factorization
Abstract
In [Cle93] we describe a new mapping of sparse matrices to the processors of a distributed-memory parallel computer, called the sparse torus wrap mapping (STWM), designed to reduce the volume of interprocessor communication during the Cholesky factorization A = LL^T. The mapping combines the advantages of the so-called dense torus wrap mapping (DTWM) [Ash91b], developed for dense matrix factorizations, with those of the domain-decomposition-type column-oriented mappings [GLN89] developed to take advantage of the structure of sparse matrices. The paper [Cle93] was mainly concerned with a classical sparse matrix model problem: a k × k grid with a nine-point stencil with n = k^2 > p unknowns ordered by nested dissection [Geo73]. The elimination tree [GL81] corresponding to the model problem is extremely well balanced and thus not indicative of typical problems. In this paper we extend the mapping to arbitrarily unbalanced elimination trees by use of a proportional mapping scheme, and thus show the utility of the mapping for general problems. We present a simple algorithm that automatically performs the assignment of elements to processors' memories. The main theoretical quantity of interest in this work is the total communication volume (the number of floating-point numbers that must be communicated from one processor's physical memory to another's). If the communication volume of an algorithm is greater in an order-of-magnitude sense than its total floating-point operation count, then that algorithm is not scalable [GMB88]. It was shown theoretically [Sch92] that no mapping of the columns would make a column-oriented algorithm for the model problem scalable. For the model problem, [GHLN88] gives a straightforward fan-out algorithm whose resultant communication volume is O(pk^2 log_2 k).
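The proportional mapping idea referenced above can be sketched in a few lines: the processor set assigned to an elimination-tree node is split among its children in proportion to the work in each child's subtree, so unbalanced trees still receive balanced processor shares. The following is a minimal illustration under our own assumptions (the `Node` representation, unit default work, and rounding rule are ours), not the paper's actual algorithm; a real implementation would weight subtrees by factorization flop counts.

```python
# Hypothetical sketch of a proportional (subtree-to-subset) mapping.
# Node layout and work weights are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    col: int                                  # column (or supernode) index
    children: list = field(default_factory=list)
    work: float = 1.0                         # work for this node alone

def subtree_work(node):
    """Total work in the subtree rooted at `node`."""
    return node.work + sum(subtree_work(c) for c in node.children)

def proportional_map(node, procs, assignment):
    """Assign processor list `procs` to `node`, then split it among the
    children in proportion to each child's subtree work."""
    assignment[node.col] = list(procs)
    kids = sorted(node.children, key=subtree_work, reverse=True)
    total = sum(subtree_work(c) for c in kids)
    start = 0
    for i, c in enumerate(kids):
        share = max(1, round(len(procs) * subtree_work(c) / total))
        # the last (smallest) child takes whatever processors remain
        end = len(procs) if i == len(kids) - 1 else min(len(procs), start + share)
        proportional_map(c, procs[start:end] or procs[:1], assignment)
        start = end
    return assignment
```

For example, a root with two children of subtree work 3 and 1, mapped onto four processors, gives the heavier child three processors and the lighter child one, mirroring the work imbalance.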
By changing the mapping of the columns to a domain-decomposition-type subtree-to-subset mapping, [GLN89] were able to modify their fan-out algorithm to require O(pk^2) communication volume in O(pk log k) messages. For a restricted family of column-oriented algorithms, they showed that this communication volume is optimal. For the case p < k, [HZ91] reduced the number of messages to O(pk log p) but retained the same communication volume. [MR92] further reduced the number of messages to O(pk), but reduced only the constant in the communication volume. [Ash91a] argued that by violating the assumption of [MR92] there existed a column-oriented algorithm that would lower the communication volume to O(p^(1/2) k^2), but at the cost of increasing the …
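For contrast with the column-oriented schemes above, the dense torus wrap mapping [Ash91b] that the STWM builds on distributes individual matrix entries over a two-dimensional processor grid rather than assigning whole columns. A minimal sketch, assuming an r × c processor grid (the function name is our own, not an interface from the paper):

```python
# Sketch of the dense torus wrap mapping: entry (i, j) is stored on
# processor (i mod r, j mod c) of an r x c grid, so successive rows and
# columns "wrap" around the grid. Names and grid shape are assumptions.
def torus_wrap_owner(i, j, r, c):
    """Grid coordinates of the processor owning matrix entry (i, j)."""
    return (i % r, j % c)

# On a 2 x 2 grid, the four entries of any 2 x 2 block land on four
# distinct processors, balancing both row and column operations.
owners = {torus_wrap_owner(i, j, 2, 2) for i in range(2) for j in range(2)}
```

Because both dimensions are wrapped, every processor touches roughly an equal share of every row and column, which is what the dense mapping exploits and what the sparse variant must reconcile with elimination-tree structure.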
Publication date: 1993